I've built an application and, like any lazy dev out there, I focused on the business logic, the project structure, the readability, comments, the dependency injection, the unit tests, you know... the code. My preference is to start from top to bottom, so I create more and more detailed implementations of interfaces while going down to the metal. The bottom of this chain is the repository, that class which handles database access, and I've spent little time trying to understand or optimize that code. I mean, it's DB access, you read or you write stuff, how difficult can it be?

  When it was time to actually test it, the performance of the application was unexpectedly bad. I profiled it and I was getting reasonable percentages for different types of code, but it was all taking too long. And suddenly my colleague says "well, I tried a few things and now it works twice as fast". Excuse me?! You did WHAT?! I have been trying a few things too, and managed to do diddly squat! Give me that PR to see what you did! And... it was nothing I could see.

  He didn't change the code, he just added or altered the attributes decorating the properties of models. That pissed me off, because I had previously gone to the generated SQL with the SQL Profiler and it was all OK. So I executed my code and his code and recorded the SQL that came out:

  • was it the lazy loading? Nope. The number of instructions and their order was exactly the same
  • was it the explicit declaration of the names of indexes and foreign keys? Nope. Removing those didn't affect performance.
  • was it the ChangeTracker.LazyLoadingEnabled=false thing? Nope, I wasn't using child entities in a way that could be affected.
  • was there some other structure of the generated SQL? No. It was exactly the same SQL! Just my code was using thousands of CPU units and his was using none.
  • was it magic? Probably, because it made no sense whatsoever! Except...

Entity Framework generates simple SQL queries, but it doesn't execute them as you and I would. It constructs a string, then uses sp_executesql to run it. Something like this:

exec sp_executesql N'SELECT TOP(1) [p].[ID], [p].[TXT], [p].[LUP_TS]
FROM [sch].[table] AS [p]
WHERE [p].[ID] = @__p_0',N'@__p_0 nvarchar(64)',@__p_0='xxxx'

Do you see it? I didn't, until I started to compare the same SQL in the two versions. It was the type of the parameters! Note that the aptly named parameter @__p_0 is an NVARCHAR, while the actual column in the database was VARCHAR! Meaning that the query above was always converting the column values just to compare them to the parameter. The waste of resources was staggering!

How do you declare the exact database type of your columns? Multiple ways. In my case there were three different problems:

  • no Unicode(false) attribute on the string columns - meaning EF expected the columns to be NVARCHAR
  • no TypeName parameter in the Column attribute where the columns were NTEXT - meaning EF expected them to be NVARCHAR(MAX)
    • I guess one could skip the Unicode thing and instead just specify the type name, but I haven't tested it
  • using MaxLength instead of StringLength - because even if their descriptions are very similar and MaxLength sounds like it applies in more cases, it's StringLength that EF wants (all three fixes are illustrated in the sketch below)
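
For illustration, here is a hedged sketch of an entity with all three fixes applied - the class and property names are invented to match the SQL above, and it assumes EF Core 6, where the Unicode attribute exists:

using System;
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;
using Microsoft.EntityFrameworkCore;

[Table("table", Schema = "sch")]
public class Post
{
  [Key]
  [Column("ID")]
  [Unicode(false)]   // column is VARCHAR, not NVARCHAR
  [StringLength(64)] // StringLength, not MaxLength, is what EF reads for the declared length
  public string Id { get; set; }

  [Column("TXT", TypeName = "ntext")] // explicit type name where the default would be NVARCHAR(MAX)
  public string Text { get; set; }

  [Column("LUP_TS")]
  public DateTime LastUpdate { get; set; }
}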

From 40-50ms per processing loop, it dropped to 21ms just by fixing these.

Long story short: parametrized SQL executed with sp_executesql hides a possible performance issue if the columns that you compare or extract have slightly different types than those of the parameters.
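
The same trap is easy to reproduce outside EF, by the way. A minimal ADO.NET sketch (the connection string is a placeholder and the table is the hypothetical one above) where declaring the parameter type explicitly avoids the conversion:

using System.Data;
using Microsoft.Data.SqlClient;

using var connection = new SqlConnection(connectionString); // placeholder
connection.Open();
using var cmd = new SqlCommand(
  "SELECT TOP(1) [ID], [TXT] FROM [sch].[table] WHERE [ID] = @id", connection);
// SqlDbType.VarChar matches the VARCHAR column, so SQL Server can seek the index
// instead of converting every row to NVARCHAR first
cmd.Parameters.Add("@id", SqlDbType.VarChar, 64).Value = "xxxx";
using var reader = cmd.ExecuteReader();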

Go figure. I hate Entity Framework!

Intro

  This post is about the System.InvalidOperationException: This SqlTransaction has completed; it is no longer usable. exception, which may be because you shared your SqlConnection or you tried to SaveChanges twice, and all of the other issues that you can google for. I was not so lucky. I spent a day and a half trying to understand what was going on and only with the help of another dev did I get close to the issue.

TL;DR;

I used a column with identity generation, but it wasn't also a primary key and EF sucks.

Details

  Imagine my scenario first: I wanted to use a database to assign a unique integer to a string. I would first search for the entry in the DB and, if not found, I would just insert a new one. The SQL Server IDENTITY(1,1) setting would ensure I got a new unique value for the inserted row. So the table would look like this:

CREATE TABLE STR_ID (
  STR NVARCHAR(64) PRIMARY KEY,
  ID INT IDENTITY(1,1)
)

Nothing fancy about this. Now for the C# part, using Entity Framework Core 6.

I created an entity class for it:

[Table("STR_ID")]
public class StrId {

  [Column("STR")]
  [Key]
  public string Text { get; set; }

  [Column("ID")]
  [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
  public int Id { get; set; }

}

And then I proceeded to test it in the following way:

  • create a DbContext instance
  • search for a value by STR/Text in the proper DbSet
  • if it doesn't exist, insert a new row and SaveChanges
  • retrieve the generated id
  • dispose the context

I also ran this 20 times in parallel (well, as Tasks - a minor distinction, but it was using the thread pool).
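
For context, a minimal sketch of that test - the StrContext class is a hypothetical DbContext exposing a DbSet<StrId>, usings and error handling are omitted:

// 20 concurrent upserts against the same table; each task gets its own DbContext
var tasks = Enumerable.Range(0, 20).Select(i => Task.Run(() =>
{
  using var context = new StrContext();
  var text = $"value-{i % 5}";
  var entry = context.StrIds.FirstOrDefault(e => e.Text == text);
  if (entry == null)
  {
    entry = new StrId { Text = text };
    context.StrIds.Add(entry);
    context.SaveChanges(); // this is where the zombie transaction exception surfaced
  }
  return entry.Id; // the IDENTITY-generated value
})).ToArray();
Task.WaitAll(tasks);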

The result was underwhelming. It would fail EVERY TIME, with either an exception about deadlocks or 

System.InvalidOperationException: This SqlTransaction has completed; it is no longer usable.
   at Microsoft.Data.SqlClient.SqlTransaction.ZombieCheck()
   at Microsoft.Data.SqlClient.SqlTransaction.Commit()

I did what every sane developer would do in this situation and bought myself a shotgun (we all know it's the most effective against zombies) then googled for other people having this issue. I mean, it would be common, right? You do some EF stuff in parallel and you get some errors.

No. This was happening in a parallelism scenario, but that was not the cause. Also, it's not really about transactions. EF wraps SaveChanges operations in a transaction, and that transaction is where the error surfaces, but something inside Microsoft's code completes the transaction prematurely - no, it's not your code!

I tried everything I could think of. I disabled the EF transaction and made my own, using all types of IsolationLevel. I tried EnableRetryOnFailure, with hilarious results: I was monitoring the values inserted in the database with NOLOCK and they were going to 20, then back to 10, then 15, then back to 1, as it took ages retrying operations that apparently depended on each other, only for almost all of them to fail after a long time. I even disabled connection pooling, which probably works, but would have made everything slow.
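
For reference, this is where the retry strategy gets switched on - a minimal sketch, the connection string being a placeholder:

// configures SQL Server's execution strategy to retry on transient failures;
// it did not help here, because the failure was not actually transient
optionsBuilder.UseSqlServer(
  "Server=...;Database=...", // placeholder
  sql => sql.EnableRetryOnFailure(maxRetryCount: 5));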

Solution

While I can't say what EXACTLY caused the problem (I would have to look into the Microsoft code and I don't feel like it now), the solution was ridiculously simple: just make the IDENTITY column a primary key instead:

CREATE TABLE STR_ID (
  ID INT PRIMARY KEY IDENTITY(1,1),
  STR NVARCHAR(64)
)

-- because this is what I am searching for
CREATE UNIQUE INDEX IX_STR_ID_STR ON STR_ID(STR)

And the entity class becomes:
[Table("STR_ID")]
public class StrId {

  [Column("ID")]
  [Key]
  public int Id { get; set; }

  [Column("STR")]
  public string Text { get; set; }

}

I was about to use IsolationLevel.ReadUncommitted for the select or just set AutoTransactionsEnabled to false (which also would have solved the problem), when the other guy suggested this solution. And I refused! It was dumb! Why the hell would that work? You dummy! And of course it worked. Why? Donno! The magical thinking in the design of EF strikes again and I am the dummy.

Conclusion

What happened is probably related to deadlocks, more specifically multiple threads trying to read/write/read again from a table and getting in each other's way. It probably has something to do with how IDENTITY columns need to lock the entire table, even if no one reads that row! But what is certain is that this is a bug: at the database level, a primary key identity column and a unique indexed identity column behave identically! And yet Entity Framework handles them very differently.

So, in conclusion:

  • yay! finally a technical post
  • this had nothing to do with how DbContexts get disposed (since in my actual scenario I was getting this from dependency injection and so I lost hours ruling that out)
  • the error about transactions was misleading, since the issue was what closed the transaction inside the Microsoft code, not whatever you did
  • the advice in some of the AggregateExceptions up the stack (An exception has been raised that is likely due to a transient failure. Consider enabling transient error resiliency by adding 'EnableRetryOnFailure' to the 'UseSqlServer' call.) was even more misleading
  • the EF support for IDENTITY columns - which it does need, because otherwise how would it know not to attempt to save values in those columns - is also misleading, because supporting them doesn't mean supporting them well
  • while parallel access to the DB made the problem visible, it has little to do with parallelism 
  • EF knows how to handle PRIMARY KEYs so that's the solution
  • EF sucks!

I really hope this saves time for people in the same situation!

A funny feature that I've encountered recently. It's not something most people would find useful, but it helps tremendously with tracing and debugging what is going on. It's easy, just add .TagWith(someString) to your LINQ query and it will generate comments in SQL. More details here: Query tags.
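
A hedged example - the context and entities are invented, TagWith itself is the real EF Core method:

var activeUsers = context.Users
  .TagWith("UserService.GetActiveUsers - why is the dashboard slow?")
  .Where(u => u.IsActive)
  .ToList();

// the generated SQL starts with the tag as a comment, easy to spot in SQL Profiler or logs:
// -- UserService.GetActiveUsers - why is the dashboard slow?
// SELECT ... FROM [Users] AS [u] WHERE [u].[IsActive] = ...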

I was trying to create an entity with several child entities and persist it to the database using Entity Framework. I was generating the entity, setting its entry state to Added and saving the changes in the DbContext, everything right. However, in the database I had one parent entity and one child entity. I suspected it had something to do with the way I created the tables, the foreign keys, something related to the way I was declaring the connection between the entities in the model. It was none of that.

If you look at the way EF generates entities from database tables, it creates a two directional relationship from any foreign key: the parent entity gets an ICollection<Child> property to hold the children and the child entity a virtual property of type Parent to hold the parent. Moreover, the parent entity instantiates the collection in the constructor in the form of a HashSet<Child>. It doesn't have to be a HashSet, though. It works just as well if you overwrite it when you create the entity with something like a List. However, the HashSet approach tells something important of the way EF behaves when considering collections of child objects.

In my case, I was doing something like
var parent = new Parent {
  Children = Enumerable
    .Repeat(new Child { SomeProperty = SomeValue }, 3)
    .ToList()
};
Can you spot the problem? When I changed the code to
var parent = new Parent();
Enumerable
  .Repeat(new Child { SomeProperty = SomeValue }, 3)
  .ToList()
  .ForEach(child => parent.Children.Add(child));
I was shocked to see that my Parent had only a list of Children with count 1. Why? Because Enumerable.Repeat takes the instance of the object you give it and repeats it:

Enumerable.Repeat(something,3).Distinct().Count() == 1 !
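
A standalone demonstration of this reference semantics, using the same Child class:

var child = new Child();
var repeated = Enumerable.Repeat(child, 3).ToList();

// all three elements are the SAME instance, not three copies
Console.WriteLine(ReferenceEquals(repeated[0], repeated[2])); // True
Console.WriteLine(repeated.Distinct().Count());               // 1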

Setting the Children collection to a List instead of a HashSet was not a problem in itself, but when saving the children, Entity Framework only considered the distinct instances of Child.

The solution here was to generate distinct instances of the same object, something like
var parent = new Parent {
  Children = Enumerable.Range(0, 3)
    .Select(i => new Child { SomeProperty = SomeValue })
    .ToList()
};
or, to make it clearer for this case:
var parent = new Parent {
  Children = Enumerable
    .Repeat<Func<Child>>(() => new Child { SomeProperty = SomeValue }, 3)
    .Select(generator => generator())
    .ToList()
};

Learning ASP.Net MVC series:
  1. Setup
  2. MVC Concepts
  3. Authentication
  4. Entity Framework Fundamentals
  5. Upgrading project to .NET Core 1.1
  6. Dependency Injection and Services

Previously on Learning ASP.Net MVC...


I started with the idea of a project that would use user configurable queries to do Google searches, store the information in the results and then perform various data analysis on them and display them based on what the user wants. However, I first implemented Google authentication and then went on to write some theoretical posts. Lastly, I upgraded the project from .NET Core 1.0 to version 1.1.

Well, it took me a while to get here because I was at a crossroads. I like the idea of dependency injectable services to do data access. At the same time there is the entire Entity Framework tutorial path that kind of wants to strongly integrate EF with my projects. I mean, if I have a service that gives me the list of all items in the database and then I want to get only a few items, it would be bad design to filter the entire list. As such, I would have to write a different method that allows me to get the items based on some kind of filters. On the other hand, Entity Framework code looks just like that "give me all you have, filtered by this", which is then translated into an efficient query to the database. One possibility would be to have my service return IQueryable<T>, so I could also use the system to generate the database code on the fly.
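
To make the trade-off concrete, here is a hedged sketch of the two shapes such a data service could take - the interface and method names are invented:

// explicit methods that describe exactly what the caller wants
public interface IContentDataServiceExplicit
{
  IEnumerable<ContentItem> GetByQuery(string queryText);
  IEnumerable<ContentItem> GetNewerThan(DateTime time);
}

// or expose IQueryable<T> and let callers compose queries
// that EF translates into SQL on the fly
public interface IContentDataServiceQueryable
{
  IQueryable<ContentItem> Items { get; }
}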

The Design


I've decided on the service architecture, against an EF type IQueryable way, because I want to be able to replace that service with anything, including something that doesn't work with a database or something that doesn't know how to dynamically create queries. Also, the idea that the service methods will describe exactly what I want appeals to me more than avoiding a bit of duplicated code.

Another thing to define now is the method through which I will implement the dependency injection. Being the control freak that I am, I would normally install my own library, something like SimpleInjector, configure it myself and use it explicitly. However, ASP.Net Core has dependency injection included out of the box, so I will use that.

As defined, the project needs queries to pass on to Google and a storage service for the results. It needs data services to manage these entities, as well as a service to abstract Google itself. The data gathering operation itself cannot be a simple REST call, since it might take a while; it must be a background task. The data analysis as well. So we need a sort of job manager.

As per a good structured design, the data objects will be stored in a separate project, as well as the interfaces for the services we will be using.

Some code, please!


Well, start with the code of the project so far: GitHub and let's get coding.

Before finding a solution to actually run the background code in the context of ASP.Net, let's write it inside a class. I am going to add a folder called Jobs and add a class in it called QueryProcessor with a method ProcessQueries. The code will be self-explanatory, I hope.
public void ProcessQueries()
{
    var now = _timeService.Now;
    var queries = _queryDataService.GetUnprocessed(now);
    var contentItems = queries.AsParallel().WithDegreeOfParallelism(3)
        .SelectMany(q => _contentService.Query(q.Text));
    _contentDataService.Update(contentItems);
}

So we get the time - from a service, of course - and request the unprocessed queries for that time, then we extract the content items for each query, which are then updated in the database. The idea here is that, the first time a query is defined or when the interval since the last time the query was processed has elapsed, the query will be sent to the content service, from which content items will be received. These items will be stored in the database.

Now, I've kept the code as concise as possible: there is no indication yet of any implementation detail and I've written as little code as I need to express my intention. Yet, what are all these services? What is a time service? What is a content service? Where are they defined? In order to enable dependency injection, we will populate all of these fields from the constructor of the query processor. Here is how the class would look in its entirety:
using ContentAggregator.Interfaces;
using System.Linq;

namespace ContentAggregator.Jobs
{
    public class QueryProcessor
    {
        private readonly IContentDataService _contentDataService;
        private readonly IContentService _contentService;
        private readonly IQueryDataService _queryDataService;
        private readonly ITimeService _timeService;

        public QueryProcessor(ITimeService timeService, IQueryDataService queryDataService, IContentDataService contentDataService, IContentService contentService)
        {
            _timeService = timeService;
            _queryDataService = queryDataService;
            _contentDataService = contentDataService;
            _contentService = contentService;
        }

        public void ProcessQueries()
        {
            var now = _timeService.Now;
            var queries = _queryDataService.GetUnprocessed(now);
            var contentItems = queries.AsParallel().WithDegreeOfParallelism(3)
                .SelectMany(q => _contentService.Query(q.Text));
            _contentDataService.Update(contentItems);
        }
    }
}

Note that the services are only defined as interfaces, which we declare in a separate project called ContentAggregator.Interfaces, referenced above in the usings block.

Let's ignore the job processor mechanism for a moment and just run ProcessQueries in a test method in the main controller. For this I will have to make dependency injection work and implement the interfaces. For brevity I will do so in the main project, although it would probably be a good idea to do it in a separate ContentAggregator.Implementations project. But let's not get ahead of ourselves. First make the code work, then arrange it all nice, in the refactoring phase.

Implementing the services


I will create mock services first, in order to test the code as it is, so the following implementations just do as little as possible while still following the interface signature.
public class ContentDataService : IContentDataService
{
    private static readonly StringBuilder _sb;

    static ContentDataService()
    {
        _sb = new StringBuilder();
    }

    public void Update(IEnumerable<ContentItem> contentItems)
    {
        foreach (var contentItem in contentItems)
        {
            _sb.AppendLine($"{contentItem.FinalUrl}:{contentItem.Title}");
        }
    }

    public static string Output
    {
        get { return _sb.ToString(); }
    }
}

public class ContentService : IContentService
{
    private readonly ITimeService _timeService;

    public ContentService(ITimeService timeService)
    {
        _timeService = timeService;
    }

    public IEnumerable<ContentItem> Query(string text)
    {
        yield return
            new ContentItem
            {
                OriginalUrl = "http://original.url",
                FinalUrl = "https://final.url",
                Title = "Mock Title",
                Description = "Mock Description",
                CreationTime = _timeService.Now,
                Time = new DateTime(2017, 03, 26),
                ContentType = "text/html",
                Error = null,
                Content = "Mock Content"
            };
    }
}

public class QueryDataService : IQueryDataService
{
    public IEnumerable<Query> GetUnprocessed(DateTime now)
    {
        yield return new Query
        {
            Text = "Some query"
        };
    }
}

public class TimeService : ITimeService
{
    public DateTime Now
    {
        get
        {
            return DateTime.UtcNow;
        }
    }
}

Now all I have to do is declare the binding between interface and implementation. The magic happens in ConfigureServices, in Startup.cs:
services.AddTransient<ITimeService, TimeService>();
services.AddTransient<IContentDataService, ContentDataService>();
services.AddTransient<IContentService, ContentService>();
services.AddTransient<IQueryDataService, QueryDataService>();

They are all transient, meaning that for each request of an implementation the system will just create a new instance. Another popular method is AddSingleton.
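
For comparison, hedged sketches of the other lifetimes, reusing the same services purely as examples:

// one instance shared for the entire application lifetime
services.AddSingleton<ITimeService, TimeService>();

// one instance per HTTP request, shared within that request
services.AddScoped<IQueryDataService, QueryDataService>();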

Using dependency injection


So, now I have to instantiate my query processor and run ProcessQueries.

One way is to set QueryProcessor as a service. I extract an interface, I add a new binding and then I give an interface as a parameter of my controller constructor:
[Authorize]
public class HomeController : Controller
{
    private readonly IQueryProcessor _queryProcessor;

    public HomeController(IQueryProcessor queryProcessor)
    {
        _queryProcessor = queryProcessor;
    }

    public IActionResult Index()
    {
        return View();
    }

    [HttpGet("/test")]
    public string Test()
    {
        _queryProcessor.ProcessQueries();
        return ContentDataService.Output;
    }
}
In fact, I don't even have to declare an interface. I can just use services.AddTransient<QueryProcessor>(); in ConfigureServices and it works as a parameter to the controller.

But what if I want to use it directly, resolve it manually, without injecting it in the controller? One can inject an IServiceProvider instead. Here is an example:
[Authorize]
public class HomeController : Controller
{
    private readonly IServiceProvider _serviceProvider;

    public HomeController(IServiceProvider serviceProvider)
    {
        _serviceProvider = serviceProvider;
    }

    public IActionResult Index()
    {
        return View();
    }

    [HttpGet("/test")]
    public string Test()
    {
        var queryProcessor = _serviceProvider.GetService<QueryProcessor>();
        queryProcessor.ProcessQueries();
        return ContentDataService.Output;
    }
}
Yet you still need to use services.Add... in ConfigureServices and inject the service provider in the constructor of the controller.

There is a way of doing it completely separately like this:
var serviceProvider = new ServiceCollection()
    .AddTransient<ITimeService, TimeService>()
    .AddTransient<IContentDataService, ContentDataService>()
    .AddTransient<IContentService, ContentService>()
    .AddTransient<IQueryDataService, QueryDataService>()
    .AddTransient<QueryProcessor>()
    .BuildServiceProvider();
var queryProcessor = serviceProvider.GetService<QueryProcessor>();

This would be the way to encapsulate the ASP.Net Dependency Injection in another object, maybe in a console application, but clearly it would be pointless in our application.

The complete source code after these modifications can be found here. Test the functionality by going to /test on your local server after you start the app.

This will be a short post to describe my own stupidity. I was testing the new Entity Framework Plain Old CLR Objects (POCO) support and so I made a small test to:
  • Clear the database
  • Insert new items in the database
  • Select the items from the database, with and without related items

Every time I got the entire object tree, with child collections and parent objects, making me think that in this implementation of EF the need to use Include was gone and that instead an Exclude method was needed in order to tell the framework to NOT LOAD related objects! Insane!
After looking everywhere to find the answer, I finally turned to profiling the SQL, only to see that the database was not being queried for related items, but only for what I had asked. Then I had my "I'm an idiot!" moment. I was using the same context and EF knew the entire hierarchy of objects because (duh!) I had just inserted it a few code lines above. Using different contexts solved the "problem" and only returned the requested objects, making Include a necessity again.
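
To illustrate, a hedged sketch with invented Parent/Child/MyContext names: write with one context, read with a fresh one, and Include becomes necessary again (the lambda overload of Include comes from the System.Data.Entity namespace in classic EF):

// insert with one context...
using (var writeContext = new MyContext())
{
    writeContext.Parents.Add(parentWithChildren);
    writeContext.SaveChanges();
}

// ...then read with a FRESH context, which tracks nothing yet
using (var readContext = new MyContext())
{
    var plain = readContext.Parents.ToList(); // no children loaded
    var full = readContext.Parents
        .Include(p => p.Children)             // explicit eager loading
        .ToList();
}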

Well, you might laugh out loud when you read this, but it is an achievement for me. I wanted to update a table (table A) which had a connection to another table (table B) through a many to many table (table A2B). I had changed A2B to link A to a new table C. I wanted to reflect that change in my Entity Model.

So the first step was to right click on the model and do an "Update Model from Database". Then I went through all the steps and clicked Finish. Now entity A has both a collection of Bs and a collection of Cs. I wanted to delete the B collection but, to my chagrin, no delete option was available.

The epiphany came when I managed to select the graphical link from entity A to entity B. The link has the delete option! Which, I have to admit, does make sense. So whenever you want to remove an association between entities... well, select the association and delete it, not the entity properties.